NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and
image processing. The corresponding matrix inherits the complex structure of
the underlying network and is in certain applications densely populated. This
makes computations, in particular matrix-vector products, with the graph
Laplacian a hard task. A typical application is the computation of a number of
its eigenvalues and eigenvectors. Standard methods become infeasible when the
number of nodes in the graph is too large. We propose the use of the fast
summation based on the nonequispaced fast Fourier transform (NFFT) to perform
the dense matrix-vector product with the graph Laplacian fast without ever
forming the whole matrix. The enormous flexibility of the NFFT algorithm allows
us to embed the accelerated multiplication into Lanczos-based eigenvalue
routines or iterative linear system solvers and even to consider kernels other
than the standard Gaussian. We illustrate the feasibility of our approach on a
number of test problems from image segmentation to semi-supervised learning
based on graph-based PDEs. In particular, we compare our approach with the
Nyström method. Moreover, we present and test an enhanced, hybrid version of
the Nyström method, which internally uses the NFFT. Comment: 28 pages, 9 figures
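The idea of running a Lanczos-type eigensolver on a dense graph operator without ever storing the matrix can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the kernel sum is evaluated directly in row blocks (quadratic cost) as a stand-in for the NFFT-based fast summation, the weights are assumed to be W_ij = exp(-||p_i - p_j||^2 / sigma^2) with a zero diagonal, and the operator is assumed to be A = D^{-1/2} W D^{-1/2}, whose largest eigenvalues correspond to the smallest eigenvalues of the symmetric normalized Laplacian I - A. The helper names `gaussian_weight_matvec` and `normalized_adjacency_operator` are hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def gaussian_weight_matvec(points, x, sigma, block=256):
    """Return W @ x for W_ij = exp(-||p_i - p_j||^2 / sigma^2), W_ii = 0.

    W is evaluated one row block at a time, so the full n-by-n matrix is
    never stored. This direct sum is a stand-in for a fast summation scheme.
    """
    n = points.shape[0]
    out = np.empty(n)
    sq_norms = np.sum(points**2, axis=1)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Squared distances between the rows of this block and all points.
        d2 = (sq_norms[start:stop, None] + sq_norms[None, :]
              - 2.0 * points[start:stop] @ points.T)
        w = np.exp(-np.maximum(d2, 0.0) / sigma**2)
        # Remove self-loops (zero diagonal of W).
        w[np.arange(stop - start), np.arange(start, stop)] = 0.0
        out[start:stop] = w @ x
    return out


def normalized_adjacency_operator(points, sigma):
    """Matrix-free operator for A = D^{-1/2} W D^{-1/2}."""
    n = points.shape[0]
    # Degrees are the row sums of W, i.e. W applied to the all-ones vector.
    degrees = gaussian_weight_matvec(points, np.ones(n), sigma)
    inv_sqrt_deg = 1.0 / np.sqrt(degrees)

    def matvec(x):
        x = np.asarray(x, dtype=np.float64).ravel()
        return inv_sqrt_deg * gaussian_weight_matvec(
            points, inv_sqrt_deg * x, sigma)

    return LinearOperator((n, n), matvec=matvec, dtype=np.float64)


# Illustrative data: 500 random points in the plane (an assumption).
rng = np.random.default_rng(0)
points = rng.standard_normal((500, 2))
A = normalized_adjacency_operator(points, sigma=1.0)

# eigsh (ARPACK, implicitly restarted Lanczos) only needs matvec calls.
top_eigenvalues = eigsh(A, k=4, which="LA", return_eigenvectors=False)
# Eigenvalues of the symmetric normalized Laplacian I - A.
laplacian_eigenvalues = 1.0 - top_eigenvalues
```

Because the eigensolver only calls `matvec`, the blockwise sum could be swapped for any faster approximate summation without touching the rest of the code.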
Run-Time Reconfiguration for HyperTransport coupled FPGAs using ACCFS
In this paper we present a solution where only one FPGA is needed in a host-coupled system, in which the FPGA can be reconfigured by a user application during run-time without losing the host link connection. A hardware infrastructure on the FPGA and the software framework ACCFS (ACCelerator File System) on the host system are provided to the user, which allow easy handling of reconfiguration and communication between the host and the FPGA. Such a system can be used for offloading compute kernels to the FPGA in high performance computing or exchanging functionality in highly available systems during run-time without losing the host link during reconfiguration. The implementation was done for a HyperTransport coupled FPGA. The design of a HyperTransport cave was extended in such a way that it provides an infrastructure for run-time reconfigurable (RTR) modules
OpenMP parallelization in the NFFT software library
software library and present the parallelization approaches used. Besides the NFFT kernel, the NFFT on the two-sphere and the fast summation based on NFFT are also parallelized. The parallelization is based on OpenMP and the multi-threaded FFTW library. Furthermore, benchmarks for various cases are performed. The results show that an efficiency higher than 0.50 and up to 0.79 can still be achieved at 12 threads.
1 Overview
The NFFT3 library [3] and its MATLAB interface were parallelized using OpenMP [7]. Both the non-parallel version and the multi-thread OpenMP version of the NFFT3 library provide an identical Application Programming Interface (API). This is realized by using distinct library files for both versions. The non-parallel version of the NFFT3 library can be found in libnfft3.so and libnfft3.a, the multi-thread OpenMP version in libnfft3_threads.so and libnfft3_threads.a. For the MATLAB interface, the user has to specify at compile time whether the non-parallel or multi-thread OpenMP version should be built. The following kernels of the NFFT3 library were parallelized using OpenMP:
• kernel/nfft:
– NDFT (nonequidistant discrete Fourier transform